import requests
from bs4 import BeautifulSoup
# get the html
= 'https://www.aer.ca/providing-information/data-and-reports/statistical-reports/st53'
url = requests.get(url)
r = r.text
data # parse the html
= BeautifulSoup(data, 'html.parser')
soup # find all the hyperlinks
= soup.findAll('a', href = True)
links # get the list of xls files only
= [link for link in links if link.get('href').endswith('xls')]
xls_links # download the xls files and save them in the current directory
for link in xls_links:
= link.get('href')
xls_url = xls_url.split('/')[-1]
xls_name = requests.get(xls_url)
xls with open(xls_name, 'wb') as f:
f.write(xls.content)
Alberta In-situ Oilsands Production Analysis using Python - Part I
Data extraction from AER website
Disclaimer
The information provided in this blog post is for educational purposes only. Commercial use of this information is prohibited. For information on the copyright and permission to reproduce, please visit the AER Copyright and Disclaimer page at AER Copyright and Disclaimer.
The author assumes no responsibility for any damage or loss resulting from the use of the information provided in this blog post.
Introduction
The Alberta Energy Regulator (AER) provides monthly injection and production data for all in-situ oilsands projects in Alberta on its website at Alberta Energy Regulator. The data is available in several spreadsheets and can be downloaded from the ST53: Alberta In Situ Oil Sands Production Summary. The annual data is reported in separate files in .xls
format. Each file contains six different sheets with information about bitumen, water, steam, wells, SOR, and WSR.
Downloading the data
To download the data using Python, the requests
and BeautifulSoup
libraries are used. The following code downloads all the .xls files from the AER website and saves them in the current directory:
Listing all the saved xls files
import os
# filter only .xls files
= [file for file in os.listdir(os.getcwd()) if file.endswith('.xls')]
xls_files xls_files
['ST53_2010-12.xls',
'ST53_2011-12.xls',
'ST53_2012-12.xls',
'ST53_2013-12.xls',
'ST53_2014-12.xls',
'ST53_2016-12.xls',
'ST53_2017-12.xls',
'ST53_2018-12.xls',
'ST53_2019-12.xls',
'ST53_2020-12.xls',
'ST53_2021-12.xls',
'ST53_2022-12.xls',
'ST53_Current.xls']
In the next post, I will show how to read the data from the .xls files and prepare the datasets for further analysis. This analysis will provide insights into the production trends and patterns of in-situ oilsands projects in Alberta.